Parallel Batch-Dynamic kd-trees
kd-trees are widely used in parallel databases to support efficient neighborhood and similarity queries. Supporting parallel updates to kd-trees is therefore an important operation. In this paper, we present BDL-tree, a parallel, batch-dynamic implementation of a kd-tree that allows for efficient parallel k-NN queries over dynamically changing point sets. BDL-trees consist of a log-structured set of kd-trees which can be used to efficiently insert or delete batches of points in parallel with polylogarithmic depth. Specifically, given a BDL-tree with n points, each batch of k updates takes $O(k \log^2(n + k))$ amortized work and $O(\log(n + k) \log\log(n + k))$ depth (parallel time). We provide an optimized multicore implementation of BDL-trees. Our optimizations include parallel cache-oblivious kd-tree construction and parallel Bloom filter construction.
Our experiments on a 36-core machine with two-way hyper-threading using a variety of synthetic and real-world datasets show that our implementation of BDL-tree achieves a self-relative speedup of up to 34.8× (28.4× on average) for batch insertions, up to 35.5× (27.2× on average) for batch deletions, and up to 46.1× (40.0× on average) for k-nearest neighbor queries. In addition, it achieves throughputs of up to 14.5 million updates/second for batch-parallel updates and 6.7 million queries/second for k-NN queries. We compare to two baseline kd-tree implementations and demonstrate that BDL-trees achieve a good tradeoff between the two baseline options for implementing batch updates.
M.Eng.
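The log-structured layout described in this abstract can be illustrated with a minimal sequential sketch of the classic logarithmic method. This is not the BDL-tree implementation: the class name `LogStructuredPointSet`, the `buffer_size` parameter, and the use of plain Python lists with brute-force search in place of static kd-trees are all assumptions made for illustration, and deletions, Bloom filters, and parallelism are omitted.

```python
import heapq
from math import dist


class LogStructuredPointSet:
    """Logarithmic-method sketch: level i is empty or holds buffer_size * 2**i points."""

    def __init__(self, buffer_size=2):
        self.buffer_size = buffer_size
        self.buffer = []   # fewer than buffer_size loose points
        self.levels = []   # levels[i] is None or a list standing in for a static kd-tree

    def insert_batch(self, batch):
        # Pool the new points with the loose buffer, then keep the remainder loose.
        pool = self.buffer + list(batch)
        rest = len(pool) % self.buffer_size
        self.buffer, pool = pool[:rest], pool[rest:]
        # Binary-counter carry: an occupied level is absorbed into the pool,
        # and a level is rebuilt only when the pool has an odd number of
        # capacity-sized chunks at that level.
        i = 0
        while pool:
            if i == len(self.levels):
                self.levels.append(None)
            capacity = self.buffer_size << i
            if self.levels[i] is not None:
                pool += self.levels[i]
                self.levels[i] = None
            if (len(pool) // capacity) % 2 == 1:
                # A real implementation would build a static kd-tree here.
                self.levels[i] = pool[:capacity]
                pool = pool[capacity:]
            i += 1

    def knn(self, query, k):
        # Query every level independently, then merge the per-level candidates.
        candidates = list(self.buffer)
        for level in self.levels:
            if level is not None:
                candidates += heapq.nsmallest(k, level, key=lambda p: dist(p, query))
        return heapq.nsmallest(k, candidates, key=lambda p: dist(p, query))

    def level_sizes(self):
        return [len(level) if level is not None else 0 for level in self.levels]


ps = LogStructuredPointSet(buffer_size=2)
ps.insert_batch([(float(i), 0.0) for i in range(10)])
ps.insert_batch([(10.0, 0.0), (11.0, 0.0), (12.0, 0.0)])
print(ps.level_sizes())
print(ps.knn((2.2, 0.0), k=2))
```

Because each level is rebuilt only when a carry reaches it, large levels are touched rarely, which is the intuition behind an amortized update bound; a query pays for visiting every non-empty level.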
ParGeo: A Library for Parallel Computational Geometry
This paper presents ParGeo, a multicore library for computational geometry.
ParGeo contains modules for fundamental tasks including kd-tree based spatial
search, spatial graph generation, and algorithms in computational geometry.
We focus on three new algorithmic contributions provided in the library.
First, we present a new parallel convex hull algorithm based on a reservation
technique to enable parallel modifications to the hull. We also provide the
first parallel implementations of the randomized incremental convex hull
algorithm as well as a divide-and-conquer convex hull algorithm in
$\mathbb{R}^3$. Second, for the smallest enclosing ball problem, we propose a
new sampling-based algorithm to quickly reduce the size of the data set. We
also provide the first parallel implementation of Welzl's classic algorithm for
smallest enclosing ball. Third, we present the BDL-tree, a parallel
batch-dynamic kd-tree that allows for efficient parallel updates and k-NN
queries over dynamically changing point sets. BDL-trees consist of a
log-structured set of kd-trees which can be used to efficiently insert,
delete, and query batches of points in parallel.
On 36 cores with two-way hyper-threading, our fastest convex hull algorithm
achieves up to 44.7x self-relative parallel speedup and up to 559x speedup
against the best existing sequential implementation. Our smallest enclosing
ball algorithm using our sampling-based algorithm achieves up to 27.1x
self-relative parallel speedup and up to 178x speedup against the best existing
sequential implementation. Our implementation of the BDL-tree achieves
self-relative parallel speedup of up to 46.1x. Across all of the algorithms in
ParGeo, we achieve self-relative parallel speedup of 8.1–46.61x.
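For readers unfamiliar with Welzl's classic algorithm mentioned in this abstract, the following is a minimal sequential sketch for the two-dimensional case (smallest enclosing circle). It is not ParGeo's code and shows neither the parallel implementation nor the sampling-based algorithm. All function names, the `EPS` tolerance, and the `seed` parameter are assumptions for illustration, and the plain recursion is only suitable for small inputs.

```python
import random
from math import dist

EPS = 1e-9  # assumed floating-point tolerance


def _circle_two(a, b):
    # Smallest circle through two points: the segment ab is a diameter.
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2), dist(a, b) / 2


def _circle_three(a, b, c):
    # Circumcircle of three points, with a fallback for collinear input.
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < EPS:
        # Collinear: the circle on the widest pair covers all three points.
        pairs = ((a, b), (a, c), (b, c))
        return max((_circle_two(p, q) for p, q in pairs), key=lambda ball: ball[1])
    sa, sb, sc = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (sa * (by - cy) + sb * (cy - ay) + sc * (ay - by)) / d
    uy = (sa * (cx - bx) + sb * (ax - cx) + sc * (bx - ax)) / d
    return (ux, uy), dist((ux, uy), a)


def _contains(ball, p):
    return ball is not None and dist(ball[0], p) <= ball[1] + EPS


def _trivial(boundary):
    # Circle determined by at most three boundary points.
    if not boundary:
        return None
    if len(boundary) == 1:
        return boundary[0], 0.0
    if len(boundary) == 2:
        return _circle_two(*boundary)
    return _circle_three(*boundary)


def _welzl(points, n, boundary):
    # Smallest circle enclosing points[:n] with `boundary` on its boundary.
    if n == 0 or len(boundary) == 3:
        return _trivial(boundary)
    p = points[n - 1]
    ball = _welzl(points, n - 1, boundary)
    if _contains(ball, p):
        return ball
    # p lies outside, so it must be on the boundary of the answer.
    return _welzl(points, n - 1, boundary + [p])


def smallest_enclosing_circle(points, seed=0):
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)  # random order gives expected linear time
    return _welzl(shuffled, len(shuffled), [])


center, radius = smallest_enclosing_circle(
    [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0), (1.0, 0.5)]
)
print(center, radius)
```

The same recursion extends to higher dimensions by allowing up to d + 1 boundary points and replacing the two circle constructions with the corresponding sphere constructions.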